A CASE STUDY OF SCALING PROBLEM IN SHIP CLASSIFICATION




1.  Introduction 

In the domain of ship classification there are potentially hundreds of candidate
targets that can be observed. In the past, several pilot studies [Booker 1988][Musman,
Chang & Booker 1993] have demonstrated the feasibility and applicability of using
Bayesian belief networks to solve the ship classification problem. However, that work
demonstrated only small examples of how the problem can be solved: in both of the
above cases the networks compared only 10-12 target types.

Because  of  the  large  number  of  targets  that  are  present  in  the  ship  classification 
problem,  there  are  potential  difficulties  and  pitfalls  which  exist  when  trying  to  scale  up 
the  example  networks  shown  in  the  pilot  studies  to  create  a  system  which  is  capable  of 
classifying  the  thousands  of  ship  targets  present  in  the  world.  At  the  moment  there  are 
more  than  640  military  combatant  ship  classes  in  the  world  and  there  are  over  10,000 
types  of  Commercial  and  Auxiliary  craft.  In  our  work  we  have  focused  on  the  task  of 
identifying  only  the  combatant  targets. 

To address such a large classification problem we use the coarse-to-fine
hierarchical classification techniques described by [Clancy 1984] and [Chandrasekaran
1986], defining a taxonomy for the ship classification problem. In addition,
because of the potential complexity problems which arise in creating belief networks
(each associated with a different level of the hierarchy), we must also ensure that the
internal structure of each network is designed so that it can address the scaling
issues normally associated with the addition of new target features. Examples of these
two issues will be given.

2.  Overview 

Ship classification involves the use of over 50 features to differentiate between
target classes. As with many other classification problems, several target classes are
often very similar in appearance, and there can be substantial variation in the
specifics of individual targets within the same target class. This latter characteristic
of the problem is normally caused by structural modifications or the addition of new
weaponry after a ship has been deployed.

Although there are several alternative ways to decompose the large problem
into a hierarchical solution, we have endeavored to perform this operation in a
manner that creates partial conclusions that have an intuitive meaning to the analyst.
While it is possible (and sometimes appropriate) to create sub-conclusions in our
taxonomy that are defined by the separability of the features, it makes much more sense
to relate the taxonomy categories to information about the target that is indicative of
its mission or military capability. Figure 1 demonstrates the hierarchical breakdown of
the ship classification problem from detection of a target on a radar plan position
indicator (PPI) down to the target's naval class designation.

Manuscript approved June 14, 1993.

At the upper levels of our taxonomy we normally assign priors which reflect the
assumption that each possible target class is equally likely. This allows our problem
solving to detect the target which best explains the observed evidence. Such priors are
not absolute and are not intended to indicate the actual frequency of occurrence of each
target type in the world. Additionally, our priors are not considered to be firm and
fixed. Instead they are expected to be tailored to match each specific mission scenario
based on prior intelligence and other activity associated with the area of the world
being analyzed (e.g., in the Mediterranean you are less likely to come across a Chinese
ship, or a specific class of targets may be known to be in port and under repair). These
"prior" values are expected to be calculated using a separate belief network that is
designed to fuse such information from diverse intelligence reports.

Belief networks are associated with each level of the taxonomy hierarchy. Each
network uses appropriate features to attempt to differentiate between the hypotheses at
that level. If the observed evidence can differentiate between the candidate hypotheses
at the given level of the taxonomy, then the problem solving continues by loading a
network associated with the next more specific level of the taxonomy. If the evidence
yields an inconclusive result, then the problem solving is suspended and the most
specific result obtained from the taxonomy is returned to the user. This approach is
described in more detail in [Musman, Chang & Booker 1993].
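
The level-by-level control flow described above can be sketched as follows. This is
only an illustrative sketch, not the system's implementation: the taxonomy contents,
the `score` callback (which stands in for evaluating the belief network at one level),
and the `margin` test used to decide that a result is "inconclusive" are all
assumptions introduced here.

```python
from typing import Callable, Dict, List

# Hypothetical two-level taxonomy; the category names are illustrative only.
TAXONOMY: Dict[str, List[str]] = {
    "ship": ["combatant", "commercial"],
    "combatant": ["frigate", "destroyer"],
}

Scorer = Callable[[str, List[str]], Dict[str, float]]


def classify(root: str, score: Scorer, margin: float = 2.0) -> str:
    """Descend the taxonomy while the evidence separates the candidates."""
    current = root
    while current in TAXONOMY:
        children = TAXONOMY[current]
        # Stand-in for loading and evaluating the network for this level.
        beliefs = score(current, children)
        ranked = sorted(children, key=lambda c: beliefs[c], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        # Assumed stopping rule: the best hypothesis must beat the runner-up
        # by a factor of `margin`, otherwise the result is inconclusive.
        if beliefs[best] < margin * beliefs[runner_up]:
            break
        current = best
    return current


def demo_score(level: str, children: List[str]) -> Dict[str, float]:
    """Fixed beliefs standing in for real network output."""
    table = {
        "ship": {"combatant": 0.9, "commercial": 0.1},
        "combatant": {"frigate": 0.55, "destroyer": 0.45},
    }
    return table[level]


result = classify("ship", demo_score)
print(result)
```

With the demonstration beliefs, the top level is decisive but the second level is not,
so the most specific category reached is returned rather than a naval class.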

Because many different networks are involved in solving the problem, it is
necessary to construct the networks to be modular. This allows us to use the results
from one network as a single piece of evidence in another network, or as priors in
another network. This approach has advantages and disadvantages. An advantage is that
the networks loaded into memory tend to be smaller and simpler than networks which
address the whole problem. As a result it is possible to use dynamically computed
measures of informativeness to efficiently order the acquisition of evidence [Pearl 1989]
[Musman, Chang & Booker 1993]. Some properties of the informativeness measures are shown
in Appendix 1. The primary disadvantage of this modular approach is that the use of such
informativeness measures is restricted to the single smaller networks. Thus, the ability
to change the focus of attention is much more limited than it could be. There is a
trade-off between flexibility and computation that must be considered when creating the
network modules.

Because of the large number of targets which must be modeled by our belief
networks, if one were to build a network (or network modules) which compares each target
to each other target type, there would be a large amount of duplicate structure within
the network, because the targets are built of essentially similar components (e.g., all
targets have superstructure mounted on the deck). This means that every time a target is
added to or deleted from such a network it is necessary to add or delete all of the
supporting "structural" links associated with that target. This makes such a network
much harder to maintain (Figure 2).

To circumvent the above problem, we have chosen to exploit the fact that all ship
targets have essentially the same structural makeup. Thus, the main difference between
the targets is found in the details of how the features appear on the targets'
components, and is not due to some targets having different components than others
(e.g., all targets have both superstructure and masts, but the shapes of these features
are different). By using this constraint, it is possible to create a single network which
represents the structural makeup of a target (Figure 3). We then use this network by
adding evidence which indicates the errors between the observed evidence and a single
specific target.

In order to use the network shown in Figure 3 it is necessary to re-instantiate the
network for each known target type by adding evidence which represents a measure of
the error between the observed evidence and the expected description for each specific
target. Thus, evidence is added to the network comparing the observations to target 1,
the result is obtained, and then the process is repeated for each known target type.
This network has two top level hypotheses: Target and Other. The "Target" hypothesis
represents a known target type (which is perfectly described when there is no error
between the observed evidence and the expected description). The "Other" hypothesis
represents randomness where all of the errors are equally likely. A {T|O} network will
be referred to here as a $T_i$-module, for the i-th ship class.

The  advantage  of  using  this  approach  to  solving  the  ship  classification  problem  is 
that  we  now  obtain  two  pieces  of  information  about  each  type  of  target: 

1) We obtain a measure of how well the evidence matches the specific target type.
This means that we can look at the final belief of the network to understand if the
specific target being observed has been modified, or is somehow different from the
prototypical example stored in our target database (i.e., this makes it possible to
identify an unknown ship).

2) It is still possible to compare the beliefs in target 1 vs target 2, etc., to
produce the probability of {target 1, target 2, ..., target N} exactly as would have
been calculated by the network shown in Figure 2.

We  will  describe  more  details  about  using  this  approach  later  in  the  paper. 

3.  Network  Structure 

Under most conditions it is appropriate to use the belief at each node to compare
candidate targets. There are, however, certain conditions which can cause the value of
belief to be other than what is desired. These arise when there is ambiguity associated
with a single observation. When such ambiguity exists, the target hypothesis that
contains the largest number of possible alternatives which may match the observation
will be assigned the highest belief. While this assignment of belief is correct, the
analyst is often sure that the single observation can only match one of the possible
outcomes (i.e., the analyst is confident that the observation is a single one and
definitely not two coincident observations that appear as one). When this happens, the
result the analyst expects to see is one which compares only the single best possible
explanation for each hypothesis versus the best explanation for each other hypothesis.
In Bayesian belief networks this comparison of best explanations is called bel* and is
defined in [Pearl 1989] as follows:

$$ \mathrm{bel}^*(x) = \max_{w_x} \, p(x, w_x \mid e) $$

where $x$ stands for a variable, $e$ stands for the pieces of evidence and $w_x$ stands
for an instantiation of all variables on the belief network except $x$. Finding the best
explanation therefore amounts to searching for the instantiation of the remaining
variables that maximizes this probability.

To illustrate this phenomenon, consider the following example:

It is often possible to identify ship targets at night by noting the number and location
of portholes along the length of the target. Each porthole location is noted as a
percentage location along the length of the target, where the bow is 0% and the stern
represents 100%. A typical measurement can be made in 10% intervals.

This  type  of  problem  is  hard  to  model  using  Bayesian  belief  networks  because  there 
are  multiple  causes  for  being  able  to  observe  a  porthole  along  the  length  of  the  ship.  Not 
only  must  a  porthole  be  present  at  a  specific  location  on  the  target  for  it  to  be  observable, 
but  also  the  lights  inside  of  the  porthole  must  be  illuminated.  This  means  that  for  a  given 
target  the  first  observed  porthole  location  on  any  given  night  may  be  the  first,  second,  or 
third  porthole  present  on  that  target,  and  so  on  for  the  remaining  portholes. 

The easiest way to model this problem is to relate the number of lights
observed to the number of lights on the target, and then exhaustively list the
possible permutations for how the lights on the target may be illuminated. This has the
combinatorial behavior we mentioned earlier (Figure 4).

The network shown in Figure 4 is interesting because it demonstrates the contrast
between bel (i.e., the posterior probability evaluated from the Bayesian belief
network) and bel* for a given network node. Given an observation located at 30%
along the ship's length, it is possible that this one observation was caused by either
the 1st or 2nd porthole on target-1, but it could only have been caused by the 1st
porthole on target-2. No porthole on target-3 can possibly match this observed porthole.
If this single piece of evidence is entered into the network, the resultant belief at
the Target node will correctly reflect the fact that target-1 is twice as likely as
target-2 and target-3 is discounted altogether (i.e., bel={0.66,0.33,0.0}). By contrast,
the value of bel* reflects the fact that only one of the alternatives for target-1 can
actually match the single observation. This causes the bel* result to distribute its
belief equally between target-1 and target-2 (i.e., bel*={0.5,0.5,0.0}).
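
The contrast between bel and bel* in this example can be reproduced with a few lines of
arithmetic. The sketch below is not the Figure 4 network itself: it assumes equal priors
on the three targets and gives every matching explanation the same unit weight, which
is enough to show that summing over explanations yields the 2:1:0 ratio while taking
the single best explanation yields 1:1:0.

```python
from typing import Dict, List

# Assumed unit weight for each way the single observation at 30% could be
# explained on each target (equal priors, equally likely explanations).
explanations: Dict[str, List[float]] = {
    "target-1": [1.0, 1.0],  # 1st or 2nd porthole could match
    "target-2": [1.0],       # only the 1st porthole could match
    "target-3": [],          # no porthole matches
}


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale the scores so that they sum to one."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}


# bel sums over all explanations of a hypothesis.
bel = normalize({t: sum(ws) for t, ws in explanations.items()})

# bel* keeps only the single best explanation of each hypothesis.
bel_star = normalize({t: max(ws, default=0.0) for t, ws in explanations.items()})

print(bel)
print(bel_star)
```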

To make this example a little more interesting we now add an additional constraint:
at night, at a long distance, it is often possible to confuse an open deck hatch
for a porthole. If this happens, then it becomes possible for the very first light
observation to be caused by an open deck hatch rather than the 1st, 2nd or 3rd porthole
on the target. To make this problem tractable we limit our example to allow only
one incorrect detection (i.e., we will only allow one deck hatch to be
observed).

As  an  additional  constraint  to  the  above  problem,  we  will  now  allow  the  observation 
of  a  single  incorrect  porthole  location  (i.e.,  we  assume  that  an  incorrect  observation 
is  an  open  deck  hatch)  without  wanting  to  penalize  our  final  belief.  This  means  that 
a  single  correct  observation  should  yield  the  same  resultant  belief  as  observing  one 
correct  observation  and  one  incorrect  observation.  If  we  observe  two  incorrect 
features,  then  this  can  be  considered  to  be  a  non-coincidental  error  and  we  will 
expect  the  resultant  belief  to  exclude  any  target  which  has  more  than  one  incorrect 
observation. 

Figure 5 demonstrates a simple network that produces the desired result when the
bel* value of the top level node is queried. It is designed to allow up to three
observations but to allow one of them to be incorrect (i.e., not match anything on a
target) without penalizing the bel* of that target. It is worthwhile to examine the
bel* value response to the piece of evidence shown in Figure 6. While the results
demonstrate that the network appears to return the desired results, it is necessary to
re-examine the structural relationships within this network to understand its
scalability characteristics. For a simple demonstration of this, take the problem shown
above as an example. Let the possible observations of porthole-1, porthole-2, porthole-3
and a phony object (e.g., a hatch) be denoted as 1, 2, 3 and W, respectively.
Exhaustively listing all of the possible outcomes yields 23 of them:

1, 2, 3, W,

12, 13, 23, 1W, 2W, 3W, W1, W2, W3,

123, 12W, 13W, 23W, W12, W13, W23, 1W2, 1W3, 2W3,

where one false detection of a phony object is allowed. This number is on the order of
n! (where n is the number of features plus the number of false alarms). If we were to
build a similarly structured network to solve a problem which would allow 6 observations
and 2 false detections (which is more commensurate with real world conditions), then we
would need to exhaustively list 846 possible outcomes. This becomes impractical. The
calculation of the number of possible outcomes is given in Appendix 2.
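
The counts quoted above (23 and 846) can be checked mechanically. The sketch below
assumes, as in both examples in the text, that the number of portholes on the target
equals the maximum number of observations, that portholes are always observed in
bow-to-stern order, and that false detections (W) may fall anywhere in the sequence. The
function names are introduced here for illustration only.

```python
from itertools import combinations
from math import comb
from typing import Set, Tuple


def count_outcomes(n_obs: int, max_false: int) -> int:
    """Closed-form count: choose k portholes, then place w false detections."""
    total = 0
    for w in range(max_false + 1):
        for k in range(n_obs - w + 1):
            if k + w == 0:
                continue  # the empty sequence is not an outcome
            total += comb(n_obs, k) * comb(k + w, w)
    return total


def enumerate_outcomes(n_obs: int, max_false: int) -> Set[Tuple[str, ...]]:
    """Brute-force listing of the same outcomes, as a cross-check."""
    outcomes = set()
    for w in range(max_false + 1):
        for k in range(n_obs - w + 1):
            if k + w == 0:
                continue
            for portholes in combinations(range(1, n_obs + 1), k):
                for w_slots in combinations(range(k + w), w):
                    remaining = iter(portholes)
                    outcomes.add(tuple(
                        "W" if i in w_slots else str(next(remaining))
                        for i in range(k + w)
                    ))
    return outcomes


print(count_outcomes(3, 1), len(enumerate_outcomes(3, 1)))
print(count_outcomes(6, 2), len(enumerate_outcomes(6, 2)))
```

Under these assumptions the per-case totals for the larger problem are 63, 249 and 534
for zero, one and two false detections, matching the breakdown in Appendix 2.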

To overcome this problem we have designed a different network structure that is
intended to produce the same bel* as the above network, but without the scaling
characteristic noted above. We call the approach Sequential Decomposition
(SD) (Figure 7). It works by imposing a different set of independence assumptions about
the observations than the above exhaustive approach. The SD approach associates each
observation only with legitimate outcomes. SD imposes on evidence from subsequent
observations the constraints obtained from the preliminary reasoning about evidence for
the first few observations. That is, constraints are explicitly represented in the SD
structure. As a result, for the 3-porthole, 1-hatch problem we will have at most seven
possible ways of explaining observed pieces of evidence. The seven possible outcomes
after two observations are {W1, 2, W2, 3, W, NO, O}, where NO denotes a constraint
violation which can only be resolved by having this observation "not observed" (NO),
and O means all other ship classes. The conditional probability of an evidence node
given 2, for example, has the same value as that given W2, because the "W" in W2 simply
means the violation of constraints. Also, because of the meanings of O and W,
conditional probabilities of evidence nodes given O and W are assumed to be equally
distributed. Note that 3 stands for 13 and 23. Since bel* selects the best
instantiation, it is possible to compress multiple outcomes into one outcome and
preserve bel*. This property of bel* leads to the equivalence relationship between the
exhaustive and SD network representations:

Property 1. The values of $\mathrm{bel}^*(T_i)$ and $\mathrm{bel}^*(O)$ computed from
the exhaustive and SD networks are equal.

That is, SD has desirable scaling properties for the computation of bel*. If we were to
build a network to solve the 6-porthole, 2-hatch problem, then we would need only 16
hypotheses for each node. This number is much better than the 846 hypotheses required
in the exhaustive approach.

While this new network is designed to produce the same value of bel* for any given
set of observations (Figure 8), it is important to note that the bel values for the two
networks are very different. This is because the independence assumptions for the
evidence in each network are different. As a result, different ambiguities exist in the
two networks, and it is these different ambiguities that cause the bel values to differ.

4.  Integration  of  Belief  Values 

The proposed approach is designed to work by instantiating a single network which
models only the structure of each target and utilizes evidence in the form of an error
measure. A separate bel* value is obtained for each target. The main advantage of this
approach is that it is very easy to add targets to or delete targets from a
classification system, because the network remains unaltered. The evidence added to
this network is in the form of an error between the observation and a specific target,
and these error measures are computed by comparing the observed evidence to feature
values stored in a database. This means that simply adding or deleting database entries
for targets is sufficient for altering the number of targets in the system.

Computationally, because we really wish to enter evidence to this network in a form
which compares the probability that the observation is porthole-1, porthole-2, etc., on
each target, we have created functions which compute these likelihoods by comparing the
observations to the database values. In doing this we have lost some of the
characteristic benefits of bi-directional inference but have gained a substantial
computational improvement.
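
The paper does not give the form of these likelihood functions. As a purely
illustrative sketch, the code below assumes a three-level weighting (peak, adjacent bin,
floor) chosen so that an observation near 20% reproduces the likelihood ratios
{5:20:5:1:1:1:1:1:1} quoted in Figure 6, and then looks up the weight at each porthole
position stored for a target. The function names, the weights and the example porthole
positions are all hypothetical.

```python
from typing import List

BINS: List[int] = list(range(10, 100, 10))  # 10%, 20%, ..., 90% along the hull


def bin_likelihoods(observed_pct: float, peak: float = 20.0,
                    near: float = 5.0, floor: float = 1.0) -> List[float]:
    """Relative likelihood of each 10% bin given one observed position."""
    nearest = min(BINS, key=lambda b: abs(b - observed_pct))
    weights = []
    for b in BINS:
        distance = abs(b - nearest)
        if distance == 0:
            weights.append(peak)   # the bin the observation falls in
        elif distance == 10:
            weights.append(near)   # the immediately adjacent bins
        else:
            weights.append(floor)  # everything else
    return weights


def porthole_likelihoods(observed_pct: float,
                         stored_positions: List[int]) -> List[float]:
    """Likelihood that the observation is porthole 1, 2, ... of one target."""
    weights = dict(zip(BINS, bin_likelihoods(observed_pct)))
    return [weights[p] for p in stored_positions]


# Hypothetical database entry: portholes at 20%, 50% and 80% of the length.
print(bin_likelihoods(20))
print(porthole_likelihoods(20, [20, 50, 80]))
```

Because the comparison against stored values happens outside the network, adding a
target only requires adding its stored positions.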

In addition to the scalability advantages associated with this technique, it is easier
to understand and analyze the behavior of the network. This is because we explicitly
model the errors associated with each distinct feature type. These error values are
always compared with a random distribution (our "Other" hypothesis), and it is thus much
easier to ensure that one feature type (or evidence source) does not carry more weight
in the decision making process than another. Unbalanced evidential weight can often be a
significant problem when a system combines evidence from a variety of diverse
sources.

Given a set of ship classes $\{T_1, \ldots, T_n\}$, the final decision for ship
classification is based on integration of the results obtained from each individual
module. Recall from the earlier discussion that our decision is based only on observed
evidence. Therefore, the priors of $T_i$ and $O$ are assumed to be equal in all
$T_i$-modules (though this technique can easily deal with unequal priors). In a
$T_i$-module, let the ratio $\mathrm{bel}^*(T_i) : \mathrm{bel}^*(O)$ be denoted by
$r_i$. The final decision among the $T_i$ is determined by comparing the $r_i$. In fact,
if $\mathrm{bel}^*(O)$ remains unchanged in different $T_i$-modules, then the ratio
$r_1 : \ldots : r_n$ is simply $\mathrm{bel}^*(T_1) : \ldots : \mathrm{bel}^*(T_n)$,
which is exactly the ratio obtained without using the {T|O} network model (i.e., with
all target classes in one node). This holds because $\mathrm{bel}^*(O)$ (Top=O) is
invariant for all $T_i$-modules: the modules differ only in the conditional
probabilities of the evidence nodes (for example, $O_1$ given the $O_1$-Perms node), the
prior of Top=O is 0.5 in every module, and for any $O_i$-Perms=O the conditional
probability of the evidence node $O_i$ is equally distributed. We describe this result
in the following Property:

Property 2. The ratio of bel* of target classes,
$\mathrm{bel}^*(T_1) : \ldots : \mathrm{bel}^*(T_n)$, computed using the {T|O} network
model is equal to that computed without using the {T|O} network model.

Proof. This property follows from the fact that $\mathrm{bel}^*(O)$ (Top=O) is invariant
for all $T_i$-modules. The only difference between $T_i$-modules is the conditional
probabilities of evidence nodes, for example, $O_1$ given the $O_1$-Perms node. Thus, in
the presence of the same pieces of evidence, to show the difference of the values of
$\mathrm{bel}^*(O)$, one only needs to consider the prior of Top=O and the conditional
probabilities of evidence nodes given $O_i$-Perms in each $T_i$-module. The prior of
Top=O is 0.5 for all $T_i$-modules. For any $O_i$-Perms=O, the conditional probability
of the evidence node $O_i$ is equally distributed, i.e.,

$$ p(e_1 \mid O_i\text{-Perms} = O) = \ldots = p(e_j \mid O_i\text{-Perms} = O) = \ldots = p(e_m \mid O_i\text{-Perms} = O), $$

where $e_j$ stands for the j-th value of $O_i$. Hence, the value of $\mathrm{bel}^*(O)$
does not change for any given module. Therefore, the ratio of bel* of the $T_i$ is the
same. □

Hypotheses can be rejected if there is no strong supporting evidence for them. This
is manifested in the ratio of bel* values between $T_i$ and $O$. That is, if the ratios
of the $T_i$-modules are smaller than 1 for all i, then a statement such as "Target is
something else." can be concluded.
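
The integration and rejection steps can be sketched as follows. This is a minimal
illustration rather than the system's code: it assumes each module has already been
evaluated and has returned a pair of values, bel* for the Target hypothesis and bel* for
the Other hypothesis, with the latter non-zero. The function name and the example
numbers for the second module are hypothetical.

```python
from typing import Dict, Tuple


def rank_targets(
    module_results: Dict[str, Tuple[float, float]]
) -> Tuple[str, Dict[str, float]]:
    """Compare targets by the ratio bel*(T) : bel*(O) from each module."""
    ratios = {
        name: bel_t / bel_o
        for name, (bel_t, bel_o) in module_results.items()
    }
    # Reject every known class when no module favors Target over Other.
    if all(ratio < 1.0 for ratio in ratios.values()):
        return "something else", ratios
    best = max(ratios, key=lambda name: ratios[name])
    return best, ratios


# First pair taken from Figure 6; the second pair is made up for contrast.
decision, ratios = rank_targets({"T1": (0.778, 0.222), "T2": (0.4, 0.6)})
print(decision, ratios)

rejected, _ = rank_targets({"T1": (0.3, 0.7), "T2": (0.4, 0.6)})
print(rejected)
```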

When several features are evaluated, bel* is calculated by direct multiplication. The
evaluation process is a recursive procedure which evaluates each $T_i$-module in turn.

In our system many of the conditional probability links contain subjective estimates
of actual probability distributions. These distributions are based on both our analysis
of the results of a limited training cycle with real data and our own extrapolations
about how the limited training results may extend to the rest of the problem domain. We
thus encourage a hybrid data-driven and model-based approach to estimating the
conditional probability links.

When estimating our conditional probabilities we restrict our estimation processes
to the comparison of likelihoods for each possible hypothesis. This allows us to better
compare the impact of evidence applied to each different hypothesis and thus to achieve
the better balance of evidential weight noted above.

5.  Conclusion 

As  with  the  previous  studies,  we  have  only  been  able  to  focus  on  a  small  portion  of 
the  ship  classification  problem.  By  combining  the  various  techniques  described 
separately  in  this  and  the  previous  papers,  it  is  possible  to  create  a  target  classification 
system  which  has  the  characteristics  required  for  the  ship  classification  problem. 

The  combined  techniques  have  been  tested  in  a  prototype  system  which  performed 
ship  classification  using  approximately  15  features,  for  over  200  targets.  While  using  15 
features  wasn’t  normally  enough  to  do  complete  classification,  the  final  target  ranking 
based  on  the  likelihood  measures  was  very  useful  as  a  decision  aid. 


Appendix  1:  Informativeness  Measure 

We considered three alternatives for computing measures of informativeness.
Among the three proposed informativeness criteria, the entropy measure is recommended,
because it preserves an ordering property (closer nodes are more informative
than farther ones with respect to the top node) which greatly reduces the amount of
searching. This conclusion results from the following two facts.

Lemma 1. Let T, X and Y be three nodes in a chain network, where T is the top, X is
a descendant of T and Y is a descendant of X (Figure 9). Then X is more informative
than Y with respect to T based on the entropy and square-error measures.

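Proof.

• entropy measure

For the entropy measure, the following inequality holds:

$$ H(T \mid X) = H(T \mid X, Y) \le H(T \mid Y) \qquad \text{(A1)} $$

• square-error measure

The square-error measure has the following property:

$$ \sum_x \Big[ \sum_t p(t \mid x)\, t \Big]^2 p(x) \qquad \text{(A2)} $$

$$ = \sum_y \sum_x \Big[ \sum_t p(t \mid x)\, t \Big]^2 p(x \mid y)\, p(y) \qquad \text{(A3)} $$

For node Y,

$$ \sum_y \Big[ \sum_t p(t \mid y)\, t \Big]^2 p(y) \qquad \text{(A4)} $$

$$ = \sum_y \Big[ \sum_x \sum_t p(t \mid x)\, p(x \mid y)\, t \Big]^2 p(y) \qquad \text{(A5)} $$

By the property of convexity, i.e.,

$$ \Big( \sum_x f(x)\, p(x \mid y) \Big)^2 \le \sum_x f(x)^2\, p(x \mid y), \qquad \text{(A6)} $$

it is easy to see that (A2) is greater than or equal to (A4).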
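
The entropy ordering of Lemma 1 can be checked numerically on a small chain. The sketch
below is only a sanity check, not part of the proof: the binary variables and the
probability tables are arbitrary values chosen here for illustration.

```python
from math import log2
from typing import Dict, Tuple


def conditional_entropy(joint: Dict[Tuple[int, int], float]) -> float:
    """H(T | V) in bits, for a joint table indexed as joint[(t, v)]."""
    marginal: Dict[int, float] = {}
    for (_, v), p in joint.items():
        marginal[v] = marginal.get(v, 0.0) + p
    return -sum(
        p * log2(p / marginal[v])
        for (_, v), p in joint.items()
        if p > 0.0
    )


# Arbitrary chain T -> X -> Y with binary variables.
p_t = [0.5, 0.5]
p_x_given_t = [[0.9, 0.1], [0.2, 0.8]]  # rows indexed by t
p_y_given_x = [[0.7, 0.3], [0.4, 0.6]]  # rows indexed by x

joint_tx = {
    (t, x): p_t[t] * p_x_given_t[t][x]
    for t in range(2) for x in range(2)
}
joint_ty = {
    (t, y): sum(p_t[t] * p_x_given_t[t][x] * p_y_given_x[x][y] for x in range(2))
    for t in range(2) for y in range(2)
}

h_t_given_x = conditional_entropy(joint_tx)
h_t_given_y = conditional_entropy(joint_ty)
print(h_t_given_x, h_t_given_y)
```

The nearer node X leaves less uncertainty about T than the farther node Y, as (A1)
states.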

The ordering property also holds for the entropy measure on networks containing simple
loop structures.

Lemma 2. Let T, X, Z and Y be four nodes on a loop, with T being the top, X and Z being
the intermediate nodes and Y being the leaf node (Figure 10). Then X is more
informative than Y with respect to T based on the entropy criterion, i.e.,

$$ H(T \mid X) \le H(T \mid Y). $$

Proof.

$$ H(T \mid X, Z) = H(T \mid X, Z, Y) \le H(T \mid Y) \qquad \text{(A7)} $$

(A7) implies

$$ H(X, Z \mid T) + H(T) - H(X, Z) \le H(T \mid Y) \qquad \text{(A8)} $$

Based on the conditional independence relation, (A8) is equal to

$$ H(X \mid T) + H(Z \mid T) - H(X, Z) + H(T) \le H(T \mid Y) \qquad \text{(A9)} $$

Because $H(X) + H(Z) \ge H(X, Z)$, (A9) implies

$$ H(X \mid T) + H(Z \mid T) - [H(X) + H(Z)] + H(T) \le H(T \mid Y) \qquad \text{(A10)} $$

$$ [H(X \mid T) - H(X) + H(T)] + [H(Z \mid T) - H(Z) + H(T)] - H(T) \le H(T \mid Y) \qquad \text{(A11)} $$

From (A11), either $[H(X \mid T) - H(X) + H(T)]$ or $[H(Z \mid T) - H(Z) + H(T)]$ must be
less than or equal to $H(T \mid Y)$, because both $H(T \mid X)$ and $H(T \mid Z)$ are
greater than $H(T)$. Assuming that $H(X \mid T) - H(X)$ is less than or equal to
$H(T \mid Y) - H(T)$, i.e.,

$$ H(X \mid T) - H(X) + H(T) \le H(T \mid Y) \qquad \text{(A12)} $$

Thus, by (A12),

$$ H(T \mid X) \le H(T \mid Y) \qquad \text{(A13)} $$

The above result can be easily extended to any number of intermediate nodes.

Appendix 2: The Complexity of Representing 6|2 Problems

The number of possible outcomes for 6 observations with a tolerance of 2 false
detections is 846. Let W denote a false detection. This value is obtained as follows:

Case 1. #(W)=0.

If there is no false detection, the number of outcomes is 63, i.e.,

$$ \binom{6}{1} + \binom{6}{2} + \binom{6}{3} + \binom{6}{4} + \binom{6}{5} + \binom{6}{6}. $$

Case 2. #(W)=1.

The number of outcomes for one false detection is 249, i.e.,

$$ 1 + 2\binom{6}{1} + 3\binom{6}{2} + 4\binom{6}{3} + 5\binom{6}{4} + 6\binom{6}{5}. $$

Case 3. #(W)=2.

In the case of two false detections, the number is 534, i.e.,

$$ 1 + \Big(\binom{2}{2} + 2\Big)\binom{6}{1} + \Big(\binom{3}{2} + 3\Big)\binom{6}{2} + \Big(\binom{4}{2} + 4\Big)\binom{6}{3} + \Big(\binom{5}{2} + 5\Big)\binom{6}{4}. $$

Summation of the three values yields 846. The maximum number of outcomes associated
with a single node in SD does not exceed 16. For instance, $O_3$-Perms contains the
following outcomes:

w, ww1, w2, ww2, 3, w3, ww3, 4, w4, ww4, 5, w5, ww5, 6, w6, ww6.
Figure 1: Ship Classification Hierarchy

Figure 2: This figure shows a simple network with 3 targets and 2 features.
Note that the representation of relationships between target and features
does not allow dynamic addition/removal of a target (i.e., a ship class) from
the network.

Figure 3: This figure shows a {T|O} network for the same features as in
Figure 2.
Possible Interpretations of Observing 2 Lights:

P1, P2
P1, P3
P2, P3
P1, H1
H1, P2
H1, P3

Figure 4: This figure illustrates the exhaustive porthole solution. The
target is assumed to have 3 portholes (P1, P2, P3) which may be illuminated.
In addition to the portholes, this target also has an open hatch
(H1) which may appear to be like a porthole from a distance. The possible
interpretations for observing 2 lights on the target are listed above.
{T, O}

{1, 2, 3, W, 12, 13, 23, 1W, 2W, 3W, W1, W2, W3,
123, 12W, 13W, 23W, W13, W23, 1W2, 1W3, 2W3}

For O1 the hypotheses are: {10%, 20%, 30%, ..., 90%}

For O2, O3 the hypotheses are: {10%, 20%, 30%, ..., 90%, NOT-OBS}

Figure 5: This figure illustrates the exhaustive 3W network, which
allows 3 observations and assumes the possibility of a single incorrect
observation.


If we observe a porthole which appears to be located about 20% of the way along the
length of the target, our evidence might be the following likelihood ratios:

{ 5:20:5:1:1:1:1:1:1 }

The resultant Bel* value for this evidence would be:

Bel*(T) = 0.778
Bel*(O) = 0.222

Figure 6: An example bel* value obtained from adding evidence to the
exhaustive-3W network shown in Figure 5.
Figure 7: This figure illustrates the SD network for the 3W problem.
With the same piece of evidence given in Figure 6, i.e.,

{ 5:20:5:1:1:1:1:1:1 }

the Bel* evaluated from the SD network would be:

Bel*(T) = 0.778
Bel*(O) = 0.222

Results in Figure 6 and Figure 8 show that Bel*(T) and Bel*(O) obtained from the SD and
exhaustive networks are identical.

Figure 8: Bel* value as evaluated from the SD-3W network when given the
same evidence as in Figure 6.